ABSTRACT

The present paper discusses and illustrates the problems with using
stepwise analytic methods and illustrates better alternatives to
these methods. To make the illustrations concrete, an actual data
set involving responses by 91 subjects to 30 variables is employed.
Though data illustrating the problems with stepwise methods in a
more dramatic fashion can be formulated, these data have the appeal
of being real. In any case, the emphasis here is on better
alternatives to stepwise methods, as against the problems, since
the problems with these methods have been so fully elaborated
elsewhere. A two-stage approach to variable selection is
recommended. Problems with using statistical significance testing
in conjunction with stepwise methods are also elaborated in some
detail.






As Huberty (1989, p. 43) notes, 

The conduct of analytical procedures in "steps" is
quite common. . . Although regression analysis and
discriminant analysis problems are, without a doubt,
the most popular contexts for the use of step-type
computational algorithms, these approaches have also
been suggested in multivariate analysis of variance
(Stevens, 1973) and in canonical correlation
analysis (Thompson, 1984, pp. 47-51; Thorndike &
Weiss, 1983).

Various researchers have emphatically criticized the use of 
conventional stepwise methods (e.g., Huberty, 1989; Huberty & 
Wisenbaker, in press; Snyder, 1991; Thompson, 1988b, 1989). 

Three major criticisms have been presented. First,
conventional stepwise methods dramatically inflate Type I error
rates. Snyder (1991) presents an impressive concrete example of how
strongly stepwise methods can be influenced by sampling error. One
reason why stepwise methods "are positively satanic in their
temptation toward Type I errors" (Cliff, 1987, p. 185) involves the
fact that computer programs use the wrong denominator degrees of
freedom and sum-of-squares in their calculations. Indeed, so do
most books, with the notable exception of Keppel and Zedeck's
(1989, pp. 402-405) recent offering, as suggested by Thompson (in
press).

Second, the variables identified after k steps of analysis may
not include all or even any of the variables in the best predictor
set of size k. For example, in a problem involving 10 predictor
variables, variables A and B may be entered in the first two steps,
but the best predictor set of size k = 2 for the same data may well
be variables C and D. Third, order of entry provides very limited
information regarding the relative importance of the variables, as
Huberty (1989) explains in more detail.

The purpose of the present paper is to discuss and illustrate
these problems and to illustrate better alternatives. Those who
would like more detail regarding these issues are urged to consult
Snyder (1991). To make the illustrations concrete, an actual data
set involving responses by 91 subjects to 30 variables is employed.
Though data that illustrate the problems with stepwise methods in
a more dramatic fashion can be formulated, these data have the
appeal of being real. In any case, the emphasis here is on
alternatives to stepwise methods, as against the problems, since
the problems with these methods have been so fully elaborated
elsewhere.

Discriminant analysis is used as the analytic method in the
present heuristic example. However, all analytic methods are
correlational and are related (Knapp, 1978; Thompson, 1988a), and
therefore the present discussion generalizes to other stepwise
methods as well, e.g., stepwise regression.

The Heuristic Data Set 

The 30 variables involved perceptions of barriers to education
in medical schools with respect to medical students'
characteristics. The variables were developed using a delphi study,
an approach that has proven useful in previous instrument
development efforts (e.g., Lester & Thomson, 1989; Thomson &
Ponder, 1979). The delphi process that ultimately produced the 30
items involved a national invitational conference; 34 professionals
participated based on being nominated by one of the four sponsoring
organizations (the Josiah Macy, Jr., Foundation, the Rockefeller
Foundation, the Robert Wood Johnson Foundation, and the Baylor
College of Medicine) as individuals who had had significant
leadership roles in promoting minority citizens' access to careers
in the health professions. The delphi study resulting in the
isolation of the 30 items is described in Baylor College of
Medicine (1986).

For the purposes of the present study, admissions officials at
144 medical schools in the United States and Canada were asked to
rate extent of agreement with the 30 statements using 1 to 5 (5 =
strong agreement) Likert scales. Admissions officials from 58
schools (40.3%) returned completed questionnaires after the first
mailing, and an additional 33 officials (22.9%) completed
questionnaires sent in a follow-up mailing to the admissions
officials at the 86 schools not responding to the first mailing.
Thus, representatives from 91 out of 144 medical schools completed
questionnaires, and the response rate was 63.2%. This response rate
was considered acceptable, especially given that the average
response rate in survey research is typically about 33% (Kerlinger,
1986).
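The response-rate arithmetic in this paragraph can be checked directly. The following is a minimal sketch; the variable names are illustrative and are not taken from the original study.

```python
# Counts reported in the text; the variable names are illustrative.
schools_surveyed = 144
first_mailing = 58
follow_up = 33

total_returned = first_mailing + follow_up
not_responding_first = schools_surveyed - first_mailing


def pct(part, whole):
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


rates = {
    "first mailing": pct(first_mailing, schools_surveyed),
    "follow-up": pct(follow_up, schools_surveyed),
    "overall": pct(total_returned, schools_surveyed),
}
print(total_returned, not_responding_first, rates)
```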

The 91 admissions officials rated extent of agreement that
each of the 30 statements involved problems encountered by
elementary and secondary educators working with each of four
referent student populations: (a) black students, (b) hispanic
students, (c) other minority students, and (d) non-minority
students. The focus of the analysis in the present study was on
explaining variance in perceptions of the four referent groups.

Initially, items not explaining an appreciable portion of
variance in perceptions of the four referent groups were deleted.
The 10 variables with the smallest effect sizes (expressed as
Wilks' lambda, or one minus the ANOVA correlation ratio) with
respect to discriminating the four referent groups were omitted:
variables 29, 24, 16, 4, 21, 14, 15, 26, 20, and 30, respectively.
The mean lambda for these 10 variables was 98.1% (SD = 1.6%). Thus,
on the average the four referents explained only about 2% of the
variance in each of these 10 predictors.
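For a single variable, lambda expressed as one minus the correlation ratio is the within-groups sum of squares divided by the total sum of squares. The sketch below shows that computation on made-up ratings; the data and the function name `univariate_lambda` are hypothetical and are not the study's data or software.

```python
def univariate_lambda(groups):
    """Wilks' lambda for one variable: SS_within / SS_total = 1 - eta^2."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return ss_within / ss_total


# Hypothetical ratings for four referent groups (not the study's data).
# Identical group means: the grouping explains nothing, so lambda is 1.
flat = [[3, 4, 3, 4], [3, 3, 4, 4], [4, 3, 3, 4], [3, 4, 4, 3]]
# Well-separated group means: lambda falls toward 0.
separated = [[1, 2], [2, 3], [3, 4], [4, 5]]

lam_flat = univariate_lambda(flat)
lam_sep = univariate_lambda(separated)
print(round(lam_flat, 3), round(lam_sep, 3))
```

A variable with lambda near 98%, as for the 10 omitted items, behaves much like the first hypothetical case: the four referents account for almost none of its variance.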

Classical Stepwise Results
The first analysis involved conducting a conventional stepwise
discriminant analysis with the four referent student groups as the
dependent variable and the 20 remaining statements as predictor
variables. Variables were only added in this stepwise analysis for
these data (for some data the stepwise algorithm will also delete
variables at certain steps), and 13 variables were entered:
variables 1, 7, 2, 8, 10, 17, 6, 22, 12, 13, 18, 25, and 5,
respectively. The F-to-enter values for the remaining seven
variables were all less than one, so the improvement in the model
resulting from adding any of these variables would not have been
statistically significant. In conventional fashion, the stepwise
analysis was terminated at this point.

Table 1 presents the variables entered at each of the 13
steps, and the associated lambda effect sizes. Lambda is similar to
R² in that it is an effect size function, and ranges between zero
and one. However, the largest effect size for R² is one, while the
largest effect size for lambda is zero, i.e., the two estimates are
inversely related.
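For reference, the standard multivariate definition of Wilks' lambda makes this inverse scaling explicit. The notation here is supplied for illustration and is not taken from the paper: W, B, and T denote the within-groups, between-groups, and total sums-of-squares-and-cross-products matrices of the predictors.

```latex
\Lambda \;=\; \frac{|\mathbf{W}|}{|\mathbf{T}|}
        \;=\; \frac{|\mathbf{W}|}{|\mathbf{W} + \mathbf{B}|},
\qquad 0 \le \Lambda \le 1
```

When the groups do not differ at all, B contributes nothing and lambda equals one; as between-groups differences grow relative to within-groups variability, lambda approaches zero.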

INSERT TABLE 1 ABOUT HERE. 

A Better Alternative to Stepwise 
If one must eliminate variables, a better procedure than
stepwise is to compute the effect size for every possible predictor
set of size k = 1, size k = 2, size k = 3, and so forth. This sounds
tedious, but can be done rapidly, accurately, and painlessly by
readily available computer software. In the multiple regression
case, the SAS program PROC RSQUARE is available. For discriminant
analysis applications, the FORTRAN program written by McCabe (1975)
is available.
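The all-possible-subsets idea can be sketched in a few lines for the regression case. The example below is not PROC RSQUARE or McCabe's program; it is a minimal illustration that computes R² for every subset from a correlation matrix and compares the result with simple forward selection. The correlations, the predictor names, and all function names are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(subset, r_xx, r_xy):
    """R^2 for a predictor subset: r_xy' R_xx^{-1} r_xy (standardized data)."""
    rxx = [[r_xx[i][j] for j in subset] for i in subset]
    rxy = [r_xy[i] for i in subset]
    beta = solve(rxx, rxy)
    return sum(b * r for b, r in zip(beta, rxy))


def best_subsets(names, r_xx, r_xy):
    """Best subset of each size k, found by exhaustive search."""
    idx = range(len(names))
    best = {}
    for k in range(1, len(names) + 1):
        top = max(combinations(idx, k), key=lambda s: r_squared(s, r_xx, r_xy))
        best[k] = (tuple(names[i] for i in top), r_squared(top, r_xx, r_xy))
    return best


def forward_selection(names, r_xx, r_xy):
    """Entry order when each step adds the predictor giving the largest R^2."""
    chosen, remaining = [], list(range(len(names)))
    while remaining:
        nxt = max(remaining, key=lambda i: r_squared(chosen + [i], r_xx, r_xy))
        chosen.append(nxt)
        remaining.remove(nxt)
    return [names[i] for i in chosen]


# Hypothetical correlations among predictors A, C, D and a criterion y.
names = ["A", "C", "D"]
r_xx = [[1.0, 0.6, 0.6],
        [0.6, 1.0, 0.0],
        [0.6, 0.0, 1.0]]
r_xy = [0.69, 0.60, 0.55]

print(forward_selection(names, r_xx, r_xy))
print(best_subsets(names, r_xx, r_xy)[2])
```

In this made-up matrix, forward selection enters A first and then C, yet the best pair is C and D, because A is largely redundant with the other two predictors. An exhaustive search is feasible for modest numbers of predictors; the number of subsets doubles with each added variable.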

Table 2 presents the lambda effect size for the best predictor
variable combinations for sizes k = 1 through 12, as computed by
McCabe's program. The program actually presents the lambdas for
several variable combinations at various values of k, but only the
best combination for each size is presented here.

INSERT TABLE 2 ABOUT HERE.



For example, the best predictor set of size k = 9 included
variables 1, 7, 2, 8, 10, 17, 6, 13, and 12 (lambda = .4100). This
example makes clear that stepwise methods do not isolate the best
predictors for a given variable set size k, since variable 22 is
entered in the eighth step of the stepwise analysis, but is not
part of the best predictor set of size k = 9. That is, the stepwise
analysis wrongly indicates that the best predictor set of size k =
9 includes variables 1, 7, 2, 8, 10, 17, 6, 22, and 12.
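The comparison in this example can be reproduced from the reported variable numbers alone. The sketch below uses the stepwise entry order from Table 1 and the best set of size 9 quoted in the text; the variable names in the code are illustrative.

```python
# Stepwise entry order (Table 1) and the best predictor set of size 9
# as reported in the text.
stepwise_order = [1, 7, 2, 8, 10, 17, 6, 22, 12, 13, 18, 25, 5]
best_set_9 = {1, 7, 2, 8, 10, 17, 6, 13, 12}

k = 9
stepwise_set_k = set(stepwise_order[:k])

# Variables that appear in one set of size 9 but not the other.
only_in_stepwise = sorted(stepwise_set_k - best_set_9)
only_in_best = sorted(best_set_9 - stepwise_set_k)
print(only_in_stepwise, only_in_best)  # [22] [13]
```

The two sets of size 9 differ in one variable: the stepwise set carries variable 22, while the best set carries variable 13 instead.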

The better selection of predictors is made in a two-stage
process in which variable selection is not conditioned upon the
results in previous steps. Stepwise results are conditioned in this
manner, e.g., if variable 6 had not been entered in step 7, then a
variable other than 22 might well have been selected in step 8.
Stepwise methods have the disadvantage of being tied to the limited
context of the variables in the study and previously entered in the
analysis. This limits the generalizability of conclusions, since
the results are conditional upon the context of previous entries.

The first step of the procedure endorsed here is to initially
determine the desired size, k, of the predictor variable set. This
can be done by computing the changes in lambda (or in R² in
regression analysis) as new predictors are added, or by plotting
lambda in a "scree" plot fashion, as illustrated in Figure 1. For
the data in hand the optimal predictor set size appears to be size
7, since the addition of other variables results in relatively
negligible contributions to predictive power.
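Computing the changes in lambda is simple arithmetic. The sketch below uses the stepwise lambdas from Table 1 as a stand-in for the best-subset lambdas, and the 0.01 cutoff for a "negligible" change is an assumption made only for illustration; the paper itself judges the elbow visually from the scree plot.

```python
# Lambdas from Table 1 (stepwise entry order), used here as a stand-in.
lambdas = [.728, .603, .542, .497, .467, .439, .426,
           .417, .411, .402, .397, .392, .388]


def lambda_drops(values, start=1.0):
    """Decrease in lambda at each step; lambda is 1.0 with no predictors."""
    previous = [start] + values[:-1]
    return [round(p - v, 3) for p, v in zip(previous, values)]


def elbow_size(values, threshold):
    """Set size reached just before the first drop smaller than threshold."""
    for size, drop in enumerate(lambda_drops(values), start=1):
        if drop < threshold:
            return size - 1
    return len(values)


drops = lambda_drops(lambdas)
print(drops)
# The 0.01 cutoff is a hypothetical choice, not a rule from the paper.
print(elbow_size(lambdas, threshold=0.01))
```

With this cutoff the rule stops at size 7, which agrees with the size suggested in the text; a different cutoff could give a different answer, which is why the choice should also be informed by judgment.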



INSERT FIGURE 1 ABOUT HERE. 

The second step in the two-stage process is to select the best
predictor set of the selected size, k. The effect size should be
consulted for this purpose. However, relying on effect size alone
tends to be just as "atheoretical" and "mechanical" as conventional
stepwise methods (Keppel & Zedeck, 1989, pp. 398, 407). A better
approach is to then select the predictor set based on theory or
previous empirical results, or based on the accessibility of the
variables in a given set.

Cliff (1987, pp. 120-121) notes that "a large proportion of
the published results using this [stepwise] method probably present
conclusions that are not supported by the data." As conventionally
applied in regression and discriminant analysis, stepwise
applications usually create serious problems.

One particular problem with stepwise analyses involves the 
propensity of researchers to apply statistical significance tests 
to evaluate how many steps to implement, as in the 13 step example 
presented here. Additional problems with statistical significance 
tests have been elaborated in detail elsewhere (Carver, 1978; 
Thompson, 1987; Welge-Crow, LeCluyse & Thompson, 1990), but this 
aspect of the problem may warrant some explanation. 

Science is the business of creating and cumulating knowledge.
This becomes possible only when results are reasonably
commensurable across studies. The problem can be illustrated with
a series of hypothetical regression studies, each involving four
predictor variables.

Say four researchers conduct identical studies, but with four
separate samples of subjects, each varying in size (n1 = 100, n2 =
18, n3 = 16, n4 = 15). Say also that the researchers have exactly
identical results with respect to the bivariate correlation
matrices from which the regression results are extracted. Table 3
presents results that fit this description.

INSERT TABLE 3 ABOUT HERE. 

For the Table 3 data, researcher one will conduct four steps
of analysis, interpreting results involving an effect size of R² =
70% (F = 54.42, df = 4/95, p < .05) and predictors A, B, C, and D.
Researcher two will conduct three steps of analysis, interpreting
results involving an effect size of R² = 60% (F = 7.00, df = 3/14,
p < .05) and predictors A, B, and C. Researcher three will conduct
two steps of analysis, interpreting results involving an effect
size of R² = 45% (F = 5.32, df = 2/13, p < .05) and predictors A
and B. Researcher four will conduct one step of analysis, and
conclude that an effect size of R² = 45% (F = 4.67, df = 1/13, p >
.05) is not statistically significant and that no predictors are
useful.
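The dependence of the test statistic on sample size follows from the usual F test of R² against zero, which has k and n - k - 1 degrees of freedom for k predictors and n subjects. The sketch below applies that standard formula to the values quoted for researchers two and three; the function name is illustrative.

```python
def f_from_r2(r2, n, k):
    """F for testing R^2 = 0 with k predictors and n subjects.

    Numerator df is k; denominator df is n - k - 1.
    """
    df_num, df_den = k, n - k - 1
    f = (r2 / df_num) / ((1 - r2) / df_den)
    return f, df_num, df_den


# R^2 values and sample sizes quoted in the text for researchers two and three.
f2, *df2 = f_from_r2(0.60, n=18, k=3)
f3, *df3 = f_from_r2(0.45, n=16, k=2)
print(round(f2, 2), df2)
print(round(f3, 2), df3)
```

Because the denominator degrees of freedom shrink with n, the same R² yields a smaller F in a smaller sample, which is how identical correlation matrices can lead to different stopping points.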

Yet all these divergent interpretations are based on exactly
the same correlation matrix, and emerge solely as an artifact of
the use of statistical significance testing in conjunction with
stepwise analysis. "Unfortunately," as Pedhazur (1982, p. 168)
notes, "social science research is replete with misinterpretations
of this kind."
Table 1

Stepwise Discriminant Analysis Results

(k = 20)

Step   Variable Added   lambda
  1        Q1            .728
  2        Q7            .603
  3        Q2            .542
  4        Q8            .497
  5        Q10           .467
  6        Q17           .439
  7        Q6            .426
  8        Q22           .417
  9        Q12           .411
 10        Q13           .402
 11        Q18           .397
 12        Q25           .392
 13        Q5            .388



Table 2

The Best Predictors for Variable Sets of Size k = 1 to 12

Table 3

Bivariate r Matrix and Stepwise Fs for Four Sample Sizes

Note. From Thompson (1991), with permission.

Figure 1

Plot of Lambda in "Scree" Form